[ZEPPELIN-6419] Fix clone paragraph content loss caused by overly aggressive shortCircuit seq filtering#5254
[ZEPPELIN-6419] Fix clone paragraph content loss caused by overly aggressive shortCircuit seq filtering#5254kevinjmh wants to merge 2 commits into
Conversation
…ressive shortCircuit seq filtering
There was a problem hiding this comment.
Thanks for tracking this down. I've been going through both the client and server side and wanted to share a few observations worth weighing before landing. I might be missing context on some of these, so please correct me where I'm wrong.
1. The new tracking Set may not be populated under current code paths
The only caller of shortCircuit() I could find is runParagraph() (message.ts:432), which invokes it with op: OP.PARAGRAPH_STATUS and no msgId. The guard in the new shortCircuit():
if (message.op === OP.PARAGRAPH && message.msgId) { ... add to set ... }doesn't appear to match in that path, so shortCircuitedParagraphMsgIds would stay empty during normal usage. If that's right, the receive-side filter would always pass through, and the runtime effect of this PR becomes "PARAGRAPH filter is bypassed." That does fix clone loss, but the Set / cap / eviction logic could end up being dead in practice. Worth checking whether there's a future caller you have in mind.
2. The 2020 heuristic may have been targeting something different from what it actually dropped
The server emits OP.PARAGRAPH in two regimes:
With echoed msgId (visible to the filter):
COMMIT_PARAGRAPH:NotebookServer.java:1095PARAGRAPH_CLEAR_OUTPUT:NotebookServer.java:1241COPY_PARAGRAPH:NotebookServer.java:1464callsupdateParagraphwhich broadcasts at:1095. This seems to be the clone-loss path.- Personalized-mode
RUN_PARAGRAPHunicast:NotebookServer.java:1549
Without msgId (bypassed by if (!message.msgId) return true):
- Paragraph execution lifecycle (PENDING / RUNNING / FINISHED):
NotebookServer.java:2008usesMSG_ID_NOT_DEFINED - Background output clears:
:1756 runAllParagraphsfallback::1499
Non-personalized runParagraph (NotebookServer.java:1528-1565) doesn't seem to echo OP.PARAGRAPH with msgId from onSuccess. The status transitions arrive via the lifecycle path without msgId and bypass the filter anyway. The server path may have looked different at the time ZEPPELIN-4985 was introduced, so the heuristic might have been more accurate originally.
3. Suggestion: keep shortCircuit, but drop the dedup branch
The instant-PENDING feedback from shortCircuit({op: OP.PARAGRAPH_STATUS, ...}) is routed to onParagraphStatus (paragraph-base.ts:81), so it's independent of the PARAGRAPH filter. And paragraph-base.ts:218 isUpdateRequired() looks like it would absorb at least some of the no-op updates from redundant echoes. Given those two, dropping the PARAGRAPH filter branch entirely might be a simpler fix than rebuilding the dedup on top of the Set.
Roughly -21 / +0 in message.ts:
receive<K extends keyof MessageReceiveDataTypeMap>(op: K): Observable<...> {
return this.received$.pipe(
filter(message => message.op === op),
map(message => message.data)
) as Observable<...>;
}
shortCircuit(message: WebSocketMessage<MessageReceiveDataTypeMap>) {
this.received$.next(this.interceptReceived(message));
}Plus removing the MAX_SHORT_CIRCUIT_SIZE constant and shortCircuitedParagraphMsgIds field.
4. Local check
With an artificial Thread.sleep on the server's paragraph run path:
- Clone preserves text correctly
- PENDING / RUNNING / FINISHED transitions remain instant through the existing
shortCircuit(PARAGRAPH_STATUS)path
I might be missing scenarios the original code was trying to handle, so feel free to push back if anything here doesn't add up.
|
@jongyoul @voidmatcha Could you take a look at this when you get a chance? I only verified the clone preservation and PENDING / RUNNING / FINISHED transition scenarios locally, and there might be other cases the original filter was trying to handle that I didn't think to check. |
What is this PR for?
Problem
When cloning a paragraph in zeppelin-web-angular, the cloned paragraph only contains the interpreter binding (e.g., %mysql) but not the actual editor content (e.g., select 1 a, 2 a). The old zeppelin-web handles this correctly.
Root Cause Analysis
The backend copyParagraph is a two-step operation:
Step 2's PARAGRAPH response gets silently discarded by the frontend shortCircuit mechanism in message.ts. The original filter logic compares the currently-sent message sequence number against the received response's sequence number:
The problem: between sending COPY_PARAGRAPH (seq=49) and receiving its PARAGRAPH response, other unrelated messages like EDITOR_SETTING (seq=50) may be sent. This makes lastMsgIdSeqSent (50) > msgIdSeqReceived (49), causing the legitimate PARAGRAPH response for the cloned paragraph to be incorrectly filtered out.
Solution
Replace the implicit sequence-number comparison with an explicit tracking set (shortCircuitedParagraphMsgIds). Only messages that were explicitly passed to shortCircuit() get filtered — not all messages where lastMsgIdSeqSent > receivedSeq.
What type of PR is it?
Bug Fix
Todos
What is the Jira issue?
How should this be tested?
Screenshots (if appropriate)
Questions: